Abstract

Biometrics make human identification possible with a sample of a bio- 
metric trait and an associated database. Classical identification tech- 
niques lead to privacy concerns. This paper introduces a new method to 
identify someone using his biometrics in an encrypted way. 

Our construction combines Bloom Filters with Storage and Locality- 
Sensitive Hashing. We apply this error-tolerant scheme, in a Hamming 
space, to achieve biometric identification in an efficient way. This is the 
first non-trivial identification scheme dealing with fuzziness and encrypted 
data. 
1 Introduction

The rise of biometric recognition systems is based on the uniqueness of some
natural information every human being carries along. For instance, it is possible
to verify that a given individual is the one he claims to be (Verification). It
is also possible to find someone's identity among a collection thanks to his
biometrics (Identification).

In this paper, we design a biometric identification system that is based on 
encrypted data, so that privacy is guaranteed, and in a way that does not take 
too much time and memory to process. For that purpose, we need to find a way 
to: 

• mitigate the effects of biometrics fuzziness, 

• and efficiently identify someone over an encrypted database. 

It follows the idea of searchable encryption and we here explain how to make 
efficient queries to the database, that look for a pattern close to a given one in 
encrypted data, i.e. a search with error-tolerance. 

*An extended abstract of this work, entitled "Error-Tolerant Searchable Encryption",
has been accepted to and will be presented at the Communication and Information Systems
Security Symposium, International Conference on Communications (ICC) 2009, June 14-18,
Dresden, Germany. This paper, "Identification with Encrypted Biometric Data", is the full
version of our work.

†This work was partially supported by the French ANR RNRT project BACH.



1.1 Related Works and Motivation 



Security of biometric systems is widely studied - cf. [3T1 [331 H] - and although a
lot of vulnerabilities are now well understood and controlled, it is still difficult 
to achieve an end-to-end system which satisfies all constraints. In particular, 
biometric template privacy is an important issue due to the non-revocability 
and non-renewability of biometric features. 

1.1.1 Biometrics and Cryptography 

A specific difficulty concerning biometrics is their fuzziness. It is nearly
impossible for a sensor to obtain the same image from a biometric data twice:
there will always be significant differences. The classical way to cope with
variations between different captures is to use a matching function, which
basically tells whether two measures represent the same biometric data or not.

The integration of biometrics into cryptographic protocols is thus difficult 
as state-of-the-art protocols are not designed for error-tolerance and fuzziness 
in their inputs. The two main leads for that are achieving a good stable coding 
of the data or making the matching algorithm part of the protocol. 

Both sides of the problem are quite hard. The extraction of a constant-length 
vector has been studied for the iris [18] and the fingerprint fS?, ^48]; the result
is a fixed-length bit string on which the matching is realized with the Hamming 
distance. Following this, we solely focus in this paper on binary biometric data 
compared with Hamming distance. 

Most protocols involving biometric data and cryptography use Secure
Sketches or Fuzzy Extractors [19] [34]. They use error correction to reduce
variations between the different measures, and to somehow hide the biometric data
behind a random codeword - e.g. [46, 39, 271I1|11[II].

On the other hand, several biometrics verification protocols, e.g. |141I101[T21 
l44l [47] , have proposed to embed the matching directly. They use the property of 
homomorphic encryption schemes to compute the Hamming distance between 
two encrypted templates. Some other interesting solutions based on adaptation 
of known cryptographic protocols are also investigated in [7] [13].

The drawback with all these techniques is that they do not fit well with 
identification in large databases as the way to run an identification among N 
data would be to run almost as many authentication algorithms. As far as we 
know, no non-trivial protocol for biometric identification involving privacy and 
confidentiality features has been proposed yet. 

1.1.2 Identification 

Several algorithms have been proposed for the so-called Nearest Neighbour and 
Approximate Nearest Neighbour (ANN) problems. Indyk wrote a review on
these topics in [29]. Recently, Hao et al. [26] demonstrated the efficiency of the
ANN approach for iris biometrics where projected values of iris templates are 
used to speed up identification requests into a large database; indeed [55] derived 
a specific ANN algorithm from the iris structure and statistical properties. 



However, in their construction the iris biometric data are never encrypted, and 
the way they boost the search for the nearest match reveals a large amount of 
information about sensitive data. 

Our work is also influenced by the problem of finding a match on encrypted
data. Boneh et al. defined the notion of Public-key Encryption with Keyword
Search (PEKS) [5], in which specific trapdoors are created for the lookup of
keywords over public-key encrypted messages. Several other papers, e.g. [24l
[21 HSl [3S1 US], have also elaborated solutions in this field. However, the main
difference between the search for a keyword as understood by Boneh et al. [H [B]
and biometric matching is that an exact match for a given bit string in the
plaintext suffices for the former, but not for the latter. For this purpose, we
introduce a new model for error-tolerant search in Sec. 3 and specific functions
to take into account fuzziness in Sec. 4.1.

The most significant difference here from the primitives introduced previously
in [5] is that messages are no longer associated with keywords. Moreover,
our primitives tolerate some imprecision in the message that is looked up. For
example, one can imagine a mailing application, where all the mails are
encrypted, and where it is possible to make queries on the mail subject. If there is
a typo in the query, then looking for the correct word should also give the mail
among the results - at least, we would like that to happen. Note that wildcards
are not well-adapted to this kind of application: a wildcard catches an error
only if we know where the error is located, whereas error-tolerance does
not have this constraint.

1.2 Construction Outline 

We propose to use recent advances in the fields of similarity searching
and public-key cryptography. Our technique narrows the identification down to a
few candidates. In a further step, we must complete it by checking the remaining
identities so that the identification request gets a definite answer.

The first step is accomplished by combining Bloom filters with locality-sensitive
hashing functions. Bloom filters speed up the search for a specified keyword
using a time-space trade-off. We use locality-sensitive hashing functions to
speed up the search for the (approximate) nearest neighbour of an element in a
reference set. Combining these primitives makes it possible to efficiently use
cryptographic methods on biometric templates, and to achieve error-tolerant
searchable encryption.

1.3 Organization 

In Section 2 we describe the biometric identification architecture that we
consider and explain the security objectives we want to reach. Section 3
introduces the security model for the cryptographic primitives that we use, based
on the new concept of Error-Tolerant Searchable Encryption. We introduce the
different functions used for our proposition in Section 4. We give in Section 5 a
step-by-step construction of an error-tolerant searchable scheme, together with
its security analysis. Application to biometric identification is explained in
Section 6, and Section 6.2 gives a practical illustration with IrisCodes.
Section 7 concludes.

An additional property of symmetric privacy is analyzed in Appendix El 

2 Architecture for Biometric Identification 

2.1 Introduction to Biometric Identification 

For a given biometrics technology, such as the fingerprint or the iris, let $B$ be
the set of all possible corresponding biometric features - i.e. data which are
captured by biometric sensors. For biometric recognition, a matching algorithm
$m : B \times B \to \mathbb{R}$ is used to compute a dissimilarity score between two data. Its
goal is to differentiate similar data from different ones:

Definition 1 A biometric template $b \in B$ is the result of a measurement from
someone's biometrics thanks to a sensor. For a specific user whose biometrics
is $\beta$, we write $b \leftarrow \beta$ for the fact that $b$ is a measure of $\beta$.

Two different measures $b, b' \leftarrow \beta$ of the same user have with high probability
a small score $m(b, b')$; measures $b_1 \leftarrow \beta_1$, $b_2 \leftarrow \beta_2$ of different users have a
large score $m(b_1, b_2)$.

In practice, some thresholds $\lambda_{min}$, $\lambda_{max}$ are chosen and the score is
considered as small (resp. large) if it is less (resp. greater) than the threshold $\lambda_{min}$
(resp. $\lambda_{max}$). This score is usually enough to determine with some precision if
two measures correspond to the same user or not. Errors, called False Reject 
and False Acceptance, are possible but this problem is outside the scope of our 
paper. 

In the following, we restrict ourselves to $B = \{0, 1\}^N$ equipped with the
Hamming distance $d$. A biometric template $b \in B$ is the result of a measurement
from someone's biometrics thanks to a sensor. Two different measures $b, b'$ of
the same user $U$ are with high probability at a Hamming distance $d(b, b') < \lambda_{min}$;
measures $b_1, b_2$ of different users $U_1, U_2$ are at a Hamming distance $d(b_1, b_2) > \lambda_{max}$.
In this case, the matching algorithm $m$ simply consists in evaluating the
Hamming distance.
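As an illustration only, the threshold-based matching described above can be sketched in Python. The bit-string encoding of templates, the function names and the toy thresholds are assumptions made for this example; they are not taken from the paper.

```python
def hamming_distance(b1: str, b2: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(b1) != len(b2):
        raise ValueError("templates must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(b1, b2))


def match(b1: str, b2: str, lambda_min: int, lambda_max: int) -> str:
    """Classify a pair of templates from its Hamming distance."""
    d = hamming_distance(b1, b2)
    if d < lambda_min:
        return "same user"
    if d > lambda_max:
        return "different users"
    return "undecided"  # between the two thresholds


reference = "1011001110"
fresh = "1011011110"     # one flipped bit: a noisy capture of the same trait
stranger = "0100110001"  # bitwise complement of the reference

print(match(reference, fresh, lambda_min=3, lambda_max=6))
print(match(reference, stranger, lambda_min=3, lambda_max=6))
```

The "undecided" outcome is a choice of this sketch: the text only says which scores count as small or large, and leaves the behaviour between the two thresholds open.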

Remark 1 For instance, iris biometric features are binary vectors of length
2048 when coded as IrisCodes following [18]. In this case of IrisCode [18], the
matching algorithm m is related to the computation of a Hamming distance 
between two IrisCodes. 

A biometric identification system - also called a one-to-many biometric system -
recognizes a person among a collection of templates. A system is given
by a reference data set $D \subset B$ and an identification function $id : B \to \mathcal{P}(D)$. On
input $b_{new}$, the system outputs a subset $C$ of $D$ containing biometric templates
$b_{ref} \in D$ such that the matching score between $b_{new}$ and $b_{ref}$ is small. This
means that $b_{new}$ and $b_{ref}$ possibly correspond to the same person. $C$ is the
empty set if no such template can be found; the size of $C$ depends on the accuracy of the
system. With pseudo-identities (either real identities of persons or pseudonyms)
registered together with the reference templates in $D$, the set $C$ gives a list of
candidates for the pseudo-identity of the person associated to $b_{new}$.
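For comparison with what follows, here is a minimal sketch of the trivial, unencrypted identification function: a linear scan that compares the fresh template with every reference template. The dictionary layout, the pseudo-identities and the threshold are hypothetical; this is the baseline the paper wants to improve upon, not its construction.

```python
from typing import Dict, Set


def hamming(b1: str, b2: str) -> int:
    return sum(c1 != c2 for c1, c2 in zip(b1, b2))


def identify(b_new: str, database: Dict[str, str], lambda_min: int) -> Set[str]:
    """Return the pseudo-identities whose reference template is close to b_new."""
    return {pid for pid, b_ref in database.items()
            if hamming(b_new, b_ref) < lambda_min}


# Hypothetical enrolled population: pseudo-identity -> reference template.
database = {
    "alice": "1011001110",
    "bob": "0100110001",
    "carol": "1111100000",
}

candidates = identify("1011011110", database, lambda_min=3)
unknown = identify("0000011111", database, lambda_min=3)
print(candidates, unknown)
```

The scan costs one comparison per enrolled user and reads every template in the clear, which is exactly what the encrypted, sub-linear approach of this paper tries to avoid.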

2.2 Architecture 

Our general model for biometric identification relies on the following entities. 

• Human users $U_i$: a set of N users are registered thanks to a sample of their
biometrics $\beta_i$ and pseudo-identities $ID_i$; more human users $U_j$ ($j > N$)
represent possible impostors with biometrics $\beta_j$.

• Sensor client SC: a device that extracts the biometric template from $\beta_i$.

• Identity Provider IP: replies to queries sent by SC by providing an identity.

• Database DB: stores the biometric data.

Remark 2 Here the sensor client is a client which captures the raw image of 
a biometric data and extracts its characteristics to output a so-called biometric 
template. Consequently, we assume that the sensor client is always honest and 
trusted by all other components. Indeed, as biometrics are public information, 
additional credentials are always required to establish security links in order to 
prevent some well-known attacks (e.g. replay attacks) and to ensure that, with 
a high probability, the biometric template captured by the sensor and used in 
the system is from a living human user. In other words, we assume that it is
difficult to produce a fake biometric template that can be accepted by the sensor. 

In an identification system, we have two main services: 

1. Enrolment registers users thanks to their physiological characteristics (for
a user $U_i$, it requires a biometric sample $b_i \leftarrow \beta_i$ and its identity $ID_i$).

2. Identification answers a request by returning a subset of the data which
was registered.

The enrolment service can be run each time a new user has to be registered. 
Depending on the application, the identification service can output either the 
identity of the candidates or their reference templates. 

As protection against outsiders, such as eavesdroppers, can be achieved with 
classical cryptographic techniques, our main objective is the protection of the 
data against insiders. In particular we assume that no attacker is able to inter- 
fere with these communications. 



2.3 Informal Objectives 

We here formulate the properties we would like to achieve in order to meet good 
privacy standards. 

Condition 1 When the biometric identification system is dealing with the
identification of a template $b$ coming from the registered user $U_i$ with identity $ID_i$, it
should return a subset containing a reference to $(ID_i, b_i)$ except for a negligible
probability.

Condition 2 When the system is dealing with the identification of a template 
b coming from an unregistered user, it should return the empty set except for 
a negligible probability. 

We do not want a malicious database to be able to link an identity to a 
biometric template, nor to be able to make relations between different identities. 

Condition 3 The database DB should not be able to distinguish two enrolled
biometric data. 

Another desired property is the fact that the database knows nothing of the 
identity of the user who goes through the identification process, for example, to 
avoid unwanted statistics. 

Condition 4 The database DB should not be able to guess which identification
request is executed. 

3 Security Model for Error-Tolerant Searchable Encryption

In this section, we describe a formal model for an error-tolerant searchable
encryption protocol. A specific construction fitting in this model is described in
Section 5. This scheme makes it possible to approximately search and retrieve a
message stored in a database, i.e. with some error-tolerance on the request. This is
in fact a problem quite close to biometric identification and the corresponding
cryptographic primitives are thus used in our system, cf. Section 6.

In the sequel, we note {m, . . . , n} the set of all integers between m and n 
(inclusive). 

3.1 Entities for the Protocol 

Our primitive models the interactions between users that store and retrieve 
information, and a remote server. We distinguish the user who stores the data 
from the one who wants to get it. This leads to three entities: 

• The server S: a remote storage system. As the server is untrusted, we 
consider the content to be public. Communications to and from this server 
are also subject to eavesdropping. 



• The sender X incrementally creates the database, by sending data to S.

• The receiver Y makes queries to the server S.

In a later part (Sec. 5), we integrate our cryptographic protocols into our
biometric identification system. In doing so, we merge the entities defined in
Sec. 2.2 with those just introduced.

We emphasize that X and Y are not necessarily the same user, as X has full
knowledge of the database he created whereas Y knows only what he receives
from S.

3.2 Definition of the Primitives 

In the sequel, messages are binary strings of a fixed length N, and $d(x_1, x_2)$, the
Hamming distance between $x_1, x_2 \in \{0, 1\}^N$, is the canonical distance, i.e. the
number of positions in $\{1, \dots, N\}$ in which $x_1$ and $x_2$ differ.

We now give a formal definition of the primitives that make error-tolerant
searchable encryption possible; this definition cannot be separated from the
definition of Completeness($\lambda_{min}$) and $\epsilon$-Soundness($\lambda_{max}$), which follows.

Definition 2 A $(\epsilon, \lambda_{min}, \lambda_{max})$-Public Key Error-Tolerant Searchable
Encryption is obtained with the following probabilistic polynomial-time methods:

• KeyGen($1^k$) initializes the system, and outputs public and private keys
$(pk, sk)$; $k$ is the security parameter. The public key $pk$ is used to store
data on a server, and the secret key $sk$ is used to retrieve information
from that server.

• $\mathrm{Send}_{X,S}(x, pk)$ is a protocol in which X sends to S the data $x \in \{0,1\}^N$
to be stored on the storage system. At the end of the protocol, S has associated
an identifier to $x$, noted $\varphi(x)$.

• $\mathrm{Retrieve}_{Y,S}(x', sk)$ is a protocol in which, given a fresh data $x' \in \{0, 1\}^N$,
Y asks for the identifiers of all data that are stored on S and are close to
$x'$, with Completeness($\lambda_{min}$) and $\epsilon$-Soundness($\lambda_{max}$). This outputs a set
of identifiers, noted $\Phi(x')$.

These definitions are completed by Condition 5 of Section 3.3, which defines
Completeness and $\epsilon$-Soundness for the parameters $\lambda_{min}$, $\lambda_{max}$ already
introduced in Section 2.1. In a few words, Completeness implies that a registered
message $x$ is indeed found if the query word $x'$ is at a distance less than
$\lambda_{min}$ from $x$, while $\epsilon$-Soundness means that with probability greater than
$1 - \epsilon$, no message at a distance greater than $\lambda_{max}$ from $x'$ will be returned.

The Send protocol produces an output $\varphi(x)$ that identifies the data $x$. This
output $\varphi(x)$ is meant to be a unique identifier, which is a binary string of
undetermined length - in other words, an element of $\{0, 1\}^*$ - that makes it
possible to retrieve $x$. It can be a timestamp, a name or nickname, etc. depending
on the application.



3.3 Security Requirements 



First of all, it is important that the scheme actually works, i.e. that the retrieval 
of a message near a registered one gives the correct result. This can be formalized 
into the following condition: 

Condition 5 (Completeness($\lambda_{min}$), $\epsilon$-Soundness($\lambda_{max}$)) Let $x_1, \dots, x_p \in B = \{0,1\}^N$
be $p$ different binary vectors, and let $x' \in B$ be another binary
vector. Suppose that the system was initialized, that all the messages $x_i$ have
been sent by user X to the system S with identifiers $\varphi(x_i)$, and that user Y
retrieved the set of identifiers $\Phi(x')$ associated to $x'$.

1. The scheme is said to be complete if the identifiers of all the $x_i$ that are
near $x'$ are almost all in the resulting set $\Phi(x')$, i.e. if

$$\eta_c = \Pr_{x'}\left[\exists i \text{ s.t. } d(x', x_i) < \lambda_{min} \text{ and } \varphi(x_i) \notin \Phi(x')\right]$$

is negligible.

2. The scheme is said to be $\epsilon$-sound if the probability of finding an unwanted
result in $\Phi(x')$, i.e.

$$\eta_s = \Pr_{x'}\left[\exists i \in \{1, \dots, p\} \text{ s.t. } d(x', x_i) > \lambda_{max} \text{ and } \varphi(x_i) \in \Phi(x')\right],$$

is bounded by $\epsilon$.

The first condition simply means that registered data is effectively retrieved
if the input is close. $\eta_c$ expresses the probability of failure of this Retrieve
operation.

The second condition means that only the close messages are retrieved, thus
limiting false alarms. $\eta_s$ measures the reliability of the Retrieve query, i.e. whether
all the results are identifiers of messages near to $x'$.
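To make the two failure events concrete, the following sketch checks, for a single query, whether a returned set of identifiers violates completeness or soundness. The identifier strings, the stored messages and the thresholds are hypothetical; estimating $\eta_c$ and $\eta_s$ would amount to averaging these indicators over many queries.

```python
from typing import Dict, Set


def hamming(a: str, b: str) -> int:
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def completeness_failure(x_query: str, stored: Dict[str, str],
                         result: Set[str], lambda_min: int) -> bool:
    """True if some stored message closer than lambda_min is missing from result."""
    return any(hamming(x_query, x) < lambda_min and ident not in result
               for ident, x in stored.items())


def soundness_failure(x_query: str, stored: Dict[str, str],
                      result: Set[str], lambda_max: int) -> bool:
    """True if result contains a message farther than lambda_max from the query."""
    return any(hamming(x_query, x) > lambda_max and ident in result
               for ident, x in stored.items())


# identifier -> stored message (toy data)
stored = {"id1": "0000000000", "id2": "1111111111", "id3": "0000011111"}
query = "0000000001"  # at distance 1, 9 and 4 from id1, id2 and id3

good = {"id1", "id3"}  # id3 lies between the thresholds: neither required nor forbidden
bad = {"id2"}          # misses id1 and returns a far message

print(completeness_failure(query, stored, good, 3), soundness_failure(query, stored, good, 6))
print(completeness_failure(query, stored, bad, 3), soundness_failure(query, stored, bad, 6))
```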

These two properties (Completeness and $\epsilon$-Soundness) are sufficient to have
a working set of primitives which allows approximate queries on a
remote storage server. The following conditions, namely Sender Privacy and
Receiver Privacy, ensure that the data stored in the server is secure, and that 
communications can be done on an untrusted network. 

Condition 6 (Sender Privacy) The scheme is said to respect Sender Privacy
if the advantage of any malicious server is negligible in the $\mathrm{Exp}^{\text{Sender Privacy}}_{A}$
experiment, described below. Here, A is an "honest-but-curious" opponent taking
the place of S, and C is a challenger at the user side.

$\mathrm{Exp}^{\text{Sender Privacy}}_{A}$

1. $(pk, sk) \leftarrow \mathrm{KeyGen}(1^k)$ (C)

2. $\{x_2, \dots, x_n\} \leftarrow A$ (A)

3. $\varphi(x_i) \leftarrow \mathrm{Send}_{X,S}(x_i, pk)$ (C)

4. $\{x_0, x_1\} \leftarrow A$ (A)

5. $\varphi(x_e) \leftarrow \mathrm{Send}_{X,S}(x_e, pk)$, $e \in_R \{0, 1\}$ (C)

6. Repeat steps (2, 3)

7. $e' \in \{0, 1\} \leftarrow A$ (A)

The advantage of the adversary is $\left| \Pr[e' = e] - \frac{1}{2} \right|$.



Informally, in a first step, the adversary receives Send requests that he chose
himself; A then looks for a couple $(x_0, x_1)$ of messages on which he should have
an advantage. C chooses one of the two messages, and the adversary must guess,
by receiving the Send requests, which one of $x_0$ or $x_1$ it was.

This condition ensures privacy of the content stored on the server.
The content that the sender transmits is protected, justifying the title "Sender
Privacy".
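The shape of this experiment can be sketched as a small simulation harness. This is a simplified illustration, not the paper's experiment: the adaptive Send queries of steps 2, 3 and 6 are omitted, the "server view" is reduced to a single transcript string, and both toy schemes (`leaky_send`, `opaque_send`) are hypothetical stand-ins rather than real encryption.

```python
import random


def sender_privacy_game(send, choose, guess, rng) -> bool:
    """One run of the simplified game; True when the adversary guesses e."""
    x0, x1 = choose()                      # step 4: adversary picks two messages
    e = rng.randrange(2)                   # step 5: challenger flips a coin
    transcript = send((x0, x1)[e], rng)    # what the server observes
    return guess(transcript) == e          # step 7: adversary outputs e'


def estimate_advantage(send, choose, guess, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of |Pr[e' = e] - 1/2|."""
    rng = random.Random(seed)
    wins = sum(sender_privacy_game(send, choose, guess, rng) for _ in range(trials))
    return abs(wins / trials - 0.5)


def leaky_send(x: str, rng) -> str:
    return x  # the server sees the plaintext


def opaque_send(x: str, rng) -> str:
    # Transcript independent of x: a stand-in for an ideal scheme.
    return "".join(rng.choice("01") for _ in x)


def choose_messages():
    return "0000", "1111"


def guess_first_bit(transcript: str) -> int:
    return int(transcript[0] == "1")


print(estimate_advantage(leaky_send, choose_messages, guess_first_bit))
print(estimate_advantage(opaque_send, choose_messages, guess_first_bit))
```

Against the leaky scheme the adversary always wins, so the estimated advantage is 1/2; against the opaque one its guess is independent of the coin and the estimate stays close to 0.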

Another important privacy aspect is the secrecy of the data that is retrieved. 
We do not want the server to have information on the fresh data x' that is 
queried; this is expressed by the following condition. 

Condition 7 (Receiver Privacy) The scheme is said to respect Receiver Privacy
if the advantage of any malicious server is negligible in the $\mathrm{Exp}^{\text{Receiver Privacy}}_{A}$
experiment described below. As in the previous condition, A denotes the
"honest-but-curious" opponent taking the place of S, and C the challenger at the user
side.

This condition is the mirror image of the previous one. It transposes the idea
that the receiver y can make his queries to S without leaking information on 
their content. The processing of the experiment is the same as the Sender 
Privacy experiment, except that A has to distinguish between Retrieve queries 
instead of Send queries. 

Remark 3 Conditions 6 and 7 are the transposition of the statements of the same
name in [6]. They aim for the same goal, i.e. privacy - against the server - of
the data that is registered first, then looked for.

Section 5 is dedicated to a construction that fits these security conditions.



4 Our Data Structure for Approximate Searching



After recalling the notions of locality-sensitive hashing and Bloom filters,
we introduce a new structure which enables approximate searching by combin- 
ing both notions. We end this section with the introduction of some classical 
cryptographic protocols. 

In the sequel, we denote [a, b] the interval of all real values between a and b 
(inclusive). 

4.1 Locality-Sensitive Hashing 

We first consider the following problem: 

Problem 1 (Approximate Nearest Neighbour Problem) Given a set $P$
of points in the metric space $(B, d)$, pre-process $P$ to efficiently answer queries.
The answer to a query $x$ is a point $p_x \in P$ such that $d(x, p_x) \le (1+\epsilon) \min_{p \in P} d(x, p)$.

This problem has been widely studied over the last decades; reviews on the
subject include [53]. However, most algorithms proposed to solve the matter
consider real spaces over the $\ell_p$ distance, which is not relevant in our case. A
way to search the approximate nearest neighbour in a Hamming space is to use a
generic construction called locality-sensitive hashing. It looks for hash functions
(not cryptographic ones) that give the same result for near points, as defined in [30].



Definition 3 ([30]) Let $B$ be a metric space, $U$ a set with a smaller
dimensionality, $r_1, r_2 \in \mathbb{R}$ with $r_1 < r_2$, $p_1, p_2 \in [0,1]$ with $p_1 > p_2$.
A family $H = \{h_1, \dots, h_\mu\}$, $h_i : B \to U$, is $(r_1, r_2, p_1, p_2)$-LSH (Locality-Sensitive Hashing), if
for all $h \in H$, $x, x' \in B$, $\Pr[h(x) = h(x')] > p_1$ (resp. $\Pr[h(x) = h(x')] < p_2$) if
$d_B(x, x') < r_1$ (resp. $d_B(x, x') > r_2$).

Such functions reduce the differences occurring between similar data with 
high probability, whereas distant data should remain significantly remote. 
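One simple LSH family for the Hamming space is bit sampling, where each function keeps a few randomly chosen coordinates of its input. It is used below purely as an illustration and is not claimed to be the family used in the paper; the parameters (64-bit inputs, 8 sampled bits, 16 functions) and all names are assumptions of this sketch.

```python
import random
from typing import Callable, List


def make_bit_sampling_family(n_bits: int, k: int, mu: int,
                             seed: int = 0) -> List[Callable[[str], str]]:
    """Return mu functions, each keeping k randomly chosen coordinates."""
    rng = random.Random(seed)
    family = []
    for _ in range(mu):
        positions = tuple(sorted(rng.sample(range(n_bits), k)))
        # Bind positions as a default argument so each function keeps its own sample.
        family.append(lambda x, pos=positions: "".join(x[p] for p in pos))
    return family


def agreements(family, x: str, y: str) -> int:
    """Number of functions of the family that collide on x and y."""
    return sum(h(x) == h(y) for h in family)


rng = random.Random(42)
x = "".join(rng.choice("01") for _ in range(64))
flipped = set(rng.sample(range(64), 2))
close = "".join(("1" if c == "0" else "0") if i in flipped else c
                for i, c in enumerate(x))          # x with two bits flipped
far = "".join("1" if c == "0" else "0" for c in x)  # complement of x

family = make_bit_sampling_family(n_bits=64, k=8, mu=16)
print(agreements(family, x, close), "of 16 functions agree on the close pair")
print(agreements(family, x, far), "of 16 functions agree on the far pair")
```

With this family, two inputs at Hamming distance $d$ collide under one function exactly when none of the $k$ sampled positions falls on a differing bit, which happens with probability $\binom{N-d}{k} / \binom{N}{k}$; this probability decreases as $d$ grows, which is the LSH behaviour of Definition 3.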

A notable example of an LSH family was proposed by Kushilevitz et al. in
[37]; see also [MllMlli- 

4.2 Bloom Filters 

As introduced by Bloom in [3], a set of Bloom filters is a data structure used
for answering set membership queries.

Definition 4 Let $D$ be a finite subset of $Y$. For a collection of $\nu$ (independent)
hash functions $H' = \{h'_1, \dots, h'_\nu\}$, with each $h'_i : Y \to \{1, \dots, m\}$, the induced
$(\nu, m)$-Bloom filter is $H'$, together with an array $(t_1, \dots, t_m) \in \{0, 1\}^m$, defined
as:

$$t_\alpha = \begin{cases} 1 & \text{if } \exists i \in \{1, \dots, \nu\},\ y \in D \text{ s.t. } h'_i(y) = \alpha, \\ 0 & \text{otherwise.} \end{cases}$$




With this setting, testing if $y$ is in $D$ is the same as checking if for all
$i \in \{1, \dots, \nu\}$, $t_{h'_i(y)} = 1$. The best setting for the filter is that the involved hash
functions be as randomized as possible, in order to fill all the buckets $t_\alpha$.

In this setting, some false positives may happen, i.e. it is possible for all
$t_{h'_i(y)}$ to be set to 1 while $y \notin D$. This event is well known, and the probability
for a query to be a false positive is: $\left(1 - \left(1 - \frac{1}{m}\right)^{\nu |D|}\right)^{\nu}$.

This probability can be made as small as needed. On the other hand, no
false negative is possible.
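A plain Bloom filter can be sketched in a few lines. In this sketch the $\nu$ hash functions are simulated by salting SHA-256 with the function index, positions run from 0 to $m-1$ instead of 1 to $m$, and the false-positive estimate is the standard one for ideal hash functions; these are assumptions of the example.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, nu: int):
        self.m, self.nu = m, nu
        self.bits = [0] * m

    def _positions(self, y: str):
        # One position per simulated hash function h'_i.
        for i in range(self.nu):
            digest = hashlib.sha256(f"{i}|{y}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, y: str) -> None:
        for a in self._positions(y):
            self.bits[a] = 1

    def __contains__(self, y: str) -> bool:
        return all(self.bits[a] for a in self._positions(y))


def false_positive_rate(m: int, nu: int, n_items: int) -> float:
    """Standard estimate (1 - (1 - 1/m)^(nu * n_items))^nu."""
    return (1 - (1 - 1 / m) ** (nu * n_items)) ** nu


items = [f"element-{n}" for n in range(100)]
bf = BloomFilter(m=1024, nu=3)
for item in items:
    bf.add(item)

print(all(item in bf for item in items))  # True: no false negatives
print(false_positive_rate(1024, 3, len(items)))
```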

We work here with the Bloom filters with storage (BFS) defined in [6] as 
an extension of Bloom filters. Their aim is to give not only the result of the 
set membership test, but also an index associated to the element. The iterative 
definition below introduces these objects and the notion of tags and buckets 
which are used in the construction. 

Definition 5 (Bloom Filter with Storage, [6]) Let $D$ be a finite subset of
a set $Y$. Consider a collection of $\nu$ hash functions $H' = \{h'_1, \dots, h'_\nu\}$, with each
$h'_j : Y \to \{1, \dots, m\}$, and a set $V$ of tags associated to $D$ with a tagging function
$\psi : D \to \mathcal{P}(V)$. A $(\nu, m)$-Bloom Filter with Storage is $H'$, together with an
array of subsets $(T_1, \dots, T_m)$ of $V$, called buckets, iteratively defined as:

1. $\forall i \in \{1, \dots, m\}$, $T_i \leftarrow \emptyset$,

2. $\forall y \in D$, $\forall j \in \{1, \dots, \nu\}$, update the bucket $T_\alpha$ with $T_\alpha \leftarrow T_\alpha \cup \psi(y)$ where
$\alpha = h'_j(y)$.

In other words, the bucket structure is empty at first, and for each element
$y \in D$ to be indexed, we add to the bucket $T_\alpha$ all the tags associated to $y$.
Construction of such a structure is illustrated in Fig. 1.
Figure 1: Construction of Bloom Filters with Storage



Example 1 In Fig. 1, assume that $D = \{y_1, y_2, y_3\}$ and $\nu = 3$; the tags
associated to $y_1$ (resp. $y_2$) have already been incorporated into the buckets $T_2$, $T_3$
and $T_\alpha$ (resp. $T_1$, $T_2$ and $T_3$) so that $T_1 = \{\psi(y_2)\}$, $T_2 = T_3 = \{\psi(y_1), \psi(y_2)\}$,
$T_\alpha = \{\psi(y_1)\}$ and $T_i = \emptyset$ otherwise. We are now treating the case of $y_3$:

• $h'_1(y_3) = \alpha$ so $T_\alpha \leftarrow T_\alpha \cup \{\psi(y_3)\}$, i.e. $T_\alpha = \{\psi(y_1), \psi(y_3)\}$;

• $h'_2(y_3) = 2$ so $T_2 \leftarrow T_2 \cup \{\psi(y_3)\}$, i.e. $T_2 = \{\psi(y_1), \psi(y_2), \psi(y_3)\}$;

• $h'_3(y_3) = m$ so $T_m \leftarrow T_m \cup \{\psi(y_3)\}$, i.e. $T_m = \{\psi(y_3)\}$.
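The bucket updates of this example can be replayed with a small sketch of a Bloom filter with storage. The concrete values $m = 5$ and $\alpha = 4$, the lookup table standing in for the hash functions, and the order in which the three functions hit the buckets of $y_1$ and $y_2$ are assumptions chosen to reproduce the buckets listed above. The tags of an element are then recovered by intersecting the buckets it addresses.

```python
from typing import Callable, List, Set


class BloomFilterWithStorage:
    def __init__(self, m: int, hash_functions: List[Callable[[str], int]]):
        self.hash_functions = hash_functions
        # Buckets T_1..T_m; index 0 is unused so that indices match the text.
        self.buckets: List[Set[str]] = [set() for _ in range(m + 1)]

    def insert(self, y: str, tags: Set[str]) -> None:
        """T_alpha <- T_alpha U tags for every alpha = h'_j(y)."""
        for h in self.hash_functions:
            self.buckets[h(y)] |= tags

    def lookup(self, y: str) -> Set[str]:
        """Intersection of the buckets addressed by y."""
        return set.intersection(*(self.buckets[h(y)] for h in self.hash_functions))


M, ALPHA = 5, 4  # hypothetical values for m and alpha
# Hypothetical hash values (h'_1(y), h'_2(y), h'_3(y)) matching Example 1.
TABLE = {"y1": (2, 3, ALPHA), "y2": (1, 2, 3), "y3": (ALPHA, 2, M)}
hash_functions = [lambda y, j=j: TABLE[y][j] for j in range(3)]

bfs = BloomFilterWithStorage(M, hash_functions)
for y in ("y1", "y2", "y3"):
    bfs.insert(y, {f"tag({y})"})

print(bfs.buckets[1:])
print(bfs.lookup("y3"))
```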

This construction makes it possible to retrieve a set of tags associated to an element
$y \in D$: it is designed to obtain $\psi(y)$, the set of tags associated to $y$, by
computing $\bigcap_{j=1}^{\nu} T_{h'_j(y)}$. For instance, in the previous example,
$\bigcap_{j=1}^{3} T_{h'_j(y_3)} = T_2 \cap T_\alpha \cap T_m = \{\psi(y_3)\}$. This intersection may capture inappropriate tags,
but choosing relevant hash functions and increasing their number reduces
the probability of that event. These properties are summed up in the
following lemma.

Lemma 1 ([3]) Let $(H', T_1, \dots, T_m)$ be a $(\nu, m)$-Bloom filter with storage
indexing a set $D$ with tags from a tag set $V$. Then, for $y \in D$, the following
properties hold:

• $\psi(y) \subset T(y) = \bigcap_{j=1}^{\nu} T_{h'_j(y)}$, i.e. each of $y$'s tags is retrieved,

• the probability for a false positive $t \in V$ is $\Pr[t \in T(y) \text{ and } t \notin \psi(y)] =$

4.3 Combining BFS and LSH 

We want to apply Bloom filters to data that are very likely to vary. To this
aim, we first apply LSH families as input to Bloom filters.

We choose $\mu$ hash functions from an adequate LSH family $h_1, \dots, h_\mu : B \to \{0, 1\}^*$,
and $\nu$ hash functions dedicated to a Bloom filter with storage
$h'_1, \dots, h'_\nu : \{0, 1\}^* \times \{1, \dots, \mu\} \to \{1, \dots, m\}$. The LSH family is denoted $H$, and $H'$ is the
BFS one. To obtain a BFS with locality-sensitive functionality, we use the
$\mu \times \nu$ composite hash functions induced by both families.

We define $h^c_{(i,j)} : B \to \{1, \dots, m\}$ the corresponding composite functions (c
stands for composite) with $h^c_{(i,j)}(y) = h'_j(h_i(y) \| i)$. Let
$H^c = \{h^c_{(i,j)}, (i,j) \in \{1, \dots, \mu\} \times \{1, \dots, \nu\}\}$ be the set of all these functions.

To sum up, we modify the update of the buckets in Def. 5 by $\alpha = h'_j(h_i(y) \| i)$.
Later on, to recover tags related to an approximate query $x' \in B$, all
we have to consider is $\bigcap_{j=1}^{\nu} \bigcap_{i=1}^{\mu} T_{h'_j(h_i(x') \| i)}$. Indeed, if $x$ and $x'$ are close
enough, then the LSH functions give the same results on $x$ and $x'$, effectively
providing a Bloom filter with storage that has the LSH property. This property
is numerically estimated in the following lemma:

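Lemma 2 Let $H$, $H'$, $H^c$ be families constructed in this setting. Let $x, x' \in B$
be two binary vectors. Assume that $H$ is $(\lambda_{min}, \lambda_{max}, \epsilon_1, \epsilon_2)$-LSH from $B$ to
$\{0, 1\}^*$; assume that $H'$ is a family of $\nu$ pseudo-random hash functions. If
the tagging function $\psi$ associates only one tag per element, then the following
properties hold:

1. If $x$ and $x'$ are far enough, then except with a small probability, $\psi(x')$ does
not intersect all the buckets that index $x$, i.e.

$$\Pr\left[\psi(x') \subset \bigcap_{h^c \in H^c} T_{h^c(x)} \text{ and } d(x, x') > \lambda_{max}\right] \le \left(\epsilon_2 + (1 - \epsilon_2)\frac{1}{m}\right)^{|H^c|}$$

2. If $x$ and $x'$ are close enough, then except with a small probability, $\psi(x')$ is
in all the buckets that index $x$, i.e.

$$\Pr\left[\psi(x') \not\subset \bigcap_{h^c \in H^c} T_{h^c(x)} \text{ and } d(x, x') < \lambda_{min}\right] \le 1 - (1 - \epsilon_1)^{|H^c|}$$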

Note that this lemma uses the simplifying hypothesis that $\forall x, |\psi(x)| = 1$,
which means that there is only one tag per vector. This has a direct application
in Section 5.2. In practice, $\psi(x)$ can be a unique handle for $x$.

Sketch of proof. The first part of the lemma expresses the fact that if
$d(x, x') > \lambda_{max}$, due to the composition of a LSH function with a pseudo-random
function, the collision probability is $\epsilon_2 + (1 - \epsilon_2)\frac{1}{m}$. Indeed, if
$h'_1(y_1) = h'_2(y_2)$, either $y_1 = y_2$
and $h'_1 = h'_2$, or there is a collision of two independent pseudo-random hash
functions. In our case, if $y_1 = y_2$, that means that $y_1 = h_{i_1}(x) \| i_1$ and
$y_2 = h_{i_2}(x') \| i_2$. For these vectors to be the same, $i_1 = i_2$ and $h_{i_1}(x) = h_{i_1}(x')$, which
happens with probability $\epsilon_2$.

The second part of the lemma says that for each $h^c \in H^c$, $h^c(x)$ and $h^c(x')$
are the same with probability $1 - \epsilon_1$. Combining the incremental construction
of the $T_i$ with this property gives the lemma. □
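The combination of both families can be illustrated with a toy, unencrypted sketch. Everything specific here is an assumption made for the example: the LSH functions are bit-sampling functions, the pseudo-random functions $h'_j$ are simulated with salted SHA-256, bucket indices run from 0 to $m-1$, and the parameters are arbitrary.

```python
import hashlib
import random

# Toy parameters (assumptions for this sketch, not values from the paper).
N_BITS, K, MU, NU, M = 64, 8, 4, 3, 1024

rng = random.Random(1)
# LSH family: h_i keeps K coordinates chosen at random (bit sampling).
LSH_POSITIONS = [sorted(rng.sample(range(N_BITS), K)) for _ in range(MU)]


def lsh(i: int, x: str) -> str:
    return "".join(x[p] for p in LSH_POSITIONS[i])


def composite_hash(i: int, j: int, x: str) -> int:
    """h^c_(i,j)(x) = h'_j(h_i(x) || i); h'_j is simulated with salted SHA-256."""
    data = f"{j}|{lsh(i, x)}|{i}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % M


def bucket_indices(x: str) -> list:
    return [composite_hash(i, j, x) for i in range(MU) for j in range(NU)]


buckets = [set() for _ in range(M)]


def enrol(x: str, tag: str) -> None:
    for a in bucket_indices(x):
        buckets[a].add(tag)


def search(x_query: str) -> set:
    """Intersect the MU * NU buckets addressed by the query."""
    return set.intersection(*(buckets[a] for a in bucket_indices(x_query)))


def random_template() -> str:
    return "".join(rng.choice("01") for _ in range(N_BITS))


x, z = random_template(), random_template()
enrol(x, "id-x")
enrol(z, "id-z")

far = "".join("1" if c == "0" else "0" for c in x)  # complement of x
noisy = ("1" if x[0] == "0" else "0") + x[1:]        # x with its first bit flipped

print(search(x))      # contains "id-x": identical inputs address identical buckets
print(search(far))    # every sampled bit differs, so a hit needs MU * NU hash collisions
print(search(noisy))  # "id-x" is found if no LSH function sampled position 0
```

Because the query intersects all $\mu\nu$ buckets, a single disagreeing LSH function is enough to lose the tag; this is the trade-off between the two bounds of the lemma as $|H^c|$ grows.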

4.4 Cryptographic Primitives 

Public Key Cryptosystem Our construction requires a semantically secure
public key cryptosystem - as defined in [25], see for instance ^Ul - to store
some encrypted data in the database. The encryption function is noted Enc and
the decryption function Dec; the use of the keys is implicit. An encryption scheme
is said to be semantically secure (against a chosen plaintext attack, also noted
IND-CPA [25]) if an adversary without access to the secret key $sk$ cannot
distinguish between the encryptions of a message $x_0$ and a message $x_1$.

Private Information Retrieval Protocols A primitive that enables privacy-ensuring
queries to databases is a Private Information Retrieval protocol (PIR,
[17]). Its goal is to retrieve a specific piece of information from a remote server in
such a way that the server does not know which data was sent. This is done through a
method $\mathrm{Query}^{PIR}_{Y,S}(a)$, that allows Y to recover the element stored at index $a$ in S by
running the PIR protocol.

Suppose a database is constituted with $M$ bits $X = x_1, \dots, x_M$. To be secure,
the protocol should satisfy the following properties [23]:

• Soundness: When the user and the database follow the protocol, the 
result of the request is exactly the requested bit. 






• User Privacy: For all $X \in \{0,1\}^M$, for $1 \le i, j \le M$, for any
algorithm used by the database, it cannot distinguish with a non-negligible
probability the difference between the requests of index $i$ and $j$.

Among the known constructions of computationally secure PIR, block-based
PIR (i.e. working on blocks of bits) makes it possible to reduce the cost. The
best performances are from Gentry and Ramzan [22] and Lipmaa [38], with a
communication complexity polynomial in the logarithm of M. Surveys of the 
subject are available in [2Tll40j . 
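
To make the Query interface more concrete, the following is a minimal sketch of a classical two-server, XOR-based PIR over a database of M bits. It is not the single-server computational PIR of [22] or [38] that this paper relies on; it is swapped in here only because it fits in a few lines and satisfies the Soundness property by construction. The function names (make_queries, answer, pir_read) are illustrative assumptions, and the scheme hides the index only if the two servers do not collude.

```python
import secrets

def make_queries(m, index):
    """Build two selection vectors whose XOR is the unit vector at `index`.

    Each vector taken alone is uniformly random, so a single server
    learns nothing about `index` (assumption: the servers do not collude).
    """
    q1 = [secrets.randbelow(2) for _ in range(m)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2

def answer(database, query):
    """Server side: XOR of all database bits selected by the query."""
    acc = 0
    for bit, selected in zip(database, query):
        if selected:
            acc ^= bit
    return acc

def pir_read(database, index):
    """Client side: query both servers and combine their answers."""
    q1, q2 = make_queries(len(database), index)
    # Both servers hold the same database in this toy setting.
    return answer(database, q1) ^ answer(database, q2)
```

The two answers differ only by the contribution of the requested position, so their XOR is exactly the requested bit.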

Some PIR protocols are called Symmetric Private Information Retrieval (SPIR)
when they comply with the Data Privacy requirement [23]. This condition
states that the querier cannot distinguish between a database that possesses
only the information he requested and a regular one; in other words, the
querier does not get more information than what he asked for.

Private Information Storage (PIS) Protocols PIR protocols make it possible to
retrieve information from a database. A Private Information Storage (PIS)
protocol [10] is a protocol that makes it possible to write information into a
database with properties similar to those of PIR. The goal is to prevent the database
from knowing the content of the information that is being stored; for detailed 
description of such protocols, see p,l4T]. 

Such a protocol provides a method update(val, index), which takes as input
an element and a database index, and puts the value val into the database
entry index. To be secure, the protocol must also satisfy the Soundness and
User Privacy properties, meaning that 1. $\mathrm{update}_{BF}$ does update the database
with the appropriate value, and 2. any algorithm run by the database cannot
distinguish between the writing requests of $(val_i, ind_i)$ and $(val_j, ind_j)$.

5 Our Construction for Error-Tolerant Searchable Encryption

5.1 Technical Description 

Our searching scheme uses all the tools described in the previous section.
As we will see in Section 5.2, this makes it possible to meet the privacy
requirements of Section 3.3. More precisely:

• We pick a family $H'$ of functions $h' : \{0,1\}^t \times \{1, \ldots, |H|\} \to \{1, \ldots, m\}$,
adapted to a Bloom filter structure,

• We choose a family $H$ of functions $h : \{0,1\}^N \to \{0,1\}^t$ that have the
LSH property,

• From these two families, we deduce a family $H^c$ of functions
$h^c : \{0,1\}^N \to \{1, \ldots, m\}$ as specified in Sec. 4.3,

• We use a semantically secure public key cryptosystem (Setup, Enc, Dec),

• We use a PIR protocol with query function $\mathrm{Query}^{PIR}_{\mathcal{Y},\mathcal{S}}$,

• We use a PIS function $\mathrm{update}_{BF}(val, i)$ that adds $val$ to the $i$-th bucket of
the Bloom filter, see Sec. 4.4.

Here come the details of the implementation. In a few words, storage and 
indexing of the data are separated, so that it becomes feasible to search over 
the encrypted documents. Indexing is made thanks to Bloom Filters, with an 
extra precaution of encrypting the content of all the buckets. Finally, using our 
locality-sensitive hashing functions permits error-tolerance. 

5.1.1 System setup 

The method KeyGen($1^k$) initializes $m$ different buckets to 0. The public and
secret keys of the cryptosystem $(pk, sk)$ are generated by Setup($1^k$), and $sk$ is
given to $\mathcal{Y}$.

5.1.2 Sending a message 

The protocol $\mathrm{Send}_{\mathcal{X},\mathcal{S}}(x, pk)$ goes through the following steps (cf. Fig. 2):

1. Identifier establishment $\mathcal{S}$ attributes to $x$ a unique identifier $\varphi(x)$, and
sends it to $\mathcal{X}$.

2. Data storage $\mathcal{X}$ sends Enc($x$) to $\mathcal{S}$, who stores it in a memory cell that
depends on $\varphi(x)$.

3. Data indexing

• $\mathcal{X}$ computes $h^c(x)$ for all $h^c \in H^c$,

• and executes $\mathrm{update}_{BF}(\mathrm{Enc}(\varphi(x)), h^c(x))$ to send Enc($\varphi(x)$) to be
added to the filter's bucket of index $h^c(x)$ on the server side.

Note that, for privacy reasons, we pad the buckets with random data so that
every bucket of the data structure has the same size $\ell$.

The first phase (identifier establishment) is done to create an identifier that
can be used to register and then retrieve $x$ from the database. For example,
$\varphi(x)$ can be the time at which $\mathcal{S}$ received $x$, or the first memory address that is
free for the storage of Enc($x$).

The third phase applies the combination of BFS and LSH functions (see Sec.
4.3) to $x$ so that it is possible to retrieve $x$ with some approximate data. This
is done with the procedure described hereafter.
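
The three phases of Send can be sketched as follows. This is only an illustration of the indexing logic under several assumptions that are not in the paper: the LSH family is plain bit sampling, the Bloom filter hashes are derived from SHA-256, all sizes (N_BITS, T_BITS, MU, GAMMA, M_BUCKETS) are arbitrary toy values, and enc is an identity placeholder standing in for Enc. The PIS protocol and the padding of buckets are omitted, so this sketch offers no privacy at all.

```python
import hashlib
import random

N_BITS = 64       # length of the binary vectors (toy value)
T_BITS = 8        # output length of each LSH function (toy value)
MU = 4            # number of LSH functions (toy value)
GAMMA = 2         # number of Bloom filter hash functions (toy value)
M_BUCKETS = 256   # number of buckets in the filter (toy value)

_rng = random.Random(0)
# Bit sampling: each LSH function keeps T_BITS fixed positions of x.
LSH_POSITIONS = [_rng.sample(range(N_BITS), T_BITS) for _ in range(MU)]

def h_lsh(i, x):
    """i-th LSH function: the sampled bits of x, as a string."""
    return "".join(str(x[p]) for p in LSH_POSITIONS[i])

def h_bf(j, data):
    """j-th Bloom filter hash, mapping a string to a bucket index."""
    digest = hashlib.sha256(f"{j}|{data}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % M_BUCKETS

def h_c(x):
    """All combined indexes h'_j(h_i(x) || i), for every pair (i, j)."""
    return [h_bf(j, h_lsh(i, x) + "|" + str(i))
            for i in range(MU) for j in range(GAMMA)]

def enc(value):
    """Placeholder for Enc: identity, NOT an encryption."""
    return value

class Server:
    def __init__(self):
        self.buckets = [set() for _ in range(M_BUCKETS)]
        self.storage = {}
        self.next_id = 0

    def new_identifier(self):
        """Phase 1: hand out a fresh identifier (here, a counter)."""
        self.next_id += 1
        return self.next_id

def send(server, x):
    tag = server.new_identifier()          # 1. identifier establishment
    server.storage[tag] = enc(tuple(x))    # 2. data storage
    for index in h_c(x):                   # 3. data indexing
        server.buckets[index].add(enc(tag))
    return tag
```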




Figure 2: Sending a message in a nutshell 



5.1.3 Retrieving data 

The protocol $\mathrm{Retrieve}_{\mathcal{Y},\mathcal{S}}(x', sk)$ goes through the following steps (cf. Fig. 3):

1. $\mathcal{Y}$ computes each $\alpha_i = h^c_i(x')$ for each $h^c_i \in H^c$, then executes
$\mathrm{Query}^{PIR}_{\mathcal{Y},\mathcal{S}}(\alpha_i)$ to receive the filter bucket $T_{\alpha_i}$,

2. $\mathcal{Y}$ decrypts the content of each bucket $T_{\alpha_i}$ and computes the intersection
of all the Dec($T_{\alpha_i}$),

3. This intersection is a set of identifiers $\{\varphi(x_{i_1}), \ldots, \varphi(x_{i_k})\}$, which is the
result of the execution of Retrieve.
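
The retrieval steps above amount to fetching a few buckets, decrypting them and intersecting the results. A minimal sketch follows; pir_query and dec are hypothetical placeholders (a plain lookup and the identity function) standing in for the PIR query and for Dec, so the sketch shows the data flow only and provides no privacy.

```python
def dec(ciphertext):
    """Placeholder for Dec: identity, NOT a decryption."""
    return ciphertext

def pir_query(buckets, index):
    """Placeholder for the PIR query: a plain, non-private lookup."""
    return buckets[index]

def retrieve(buckets, indexes):
    """Intersect the decrypted content of the buckets at `indexes`.

    `indexes` plays the role of the values alpha_i computed by the
    receiver from the query x'.
    """
    decrypted = [{dec(c) for c in pir_query(buckets, a)} for a in indexes]
    if not decrypted:
        return set()
    return set.intersection(*decrypted)
```

An identifier is returned only if it appears in every fetched bucket, which is what makes a single spurious collision harmless.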




Figure 3: Retrieving data in a nutshell 



As we can see, the retrieving process follows that of Sec. 4.3, with the
noticeable differences that 1. the identifiers are always encrypted in the database,
and 2. the query is made following a PIR protocol. This makes it possible to
benefit from the Bloom filter structure, the locality-sensitive hashing, and the
privacy-preserving protocols.

The secure protocols involved do not leak information on the requests made, 
and the next section discusses more precisely the security properties achieved. 



5.2 Security Properties 



We now demonstrate that this construction faithfully achieves the security
requirements we defined in Sec. 3.3.

Proposition 1 (Completeness) Provided that $H$ is a $(\lambda_{min}, \lambda_{max}, \epsilon_1, \epsilon_2)$-LSH
family, for a negligible $\epsilon_1$, this scheme is complete.

Proposition 2 ($\epsilon$-Soundness) Provided that $H$ is a $(\lambda_{min}, \lambda_{max}, \epsilon_1, \epsilon_2)$-LSH
family from $\{0,1\}^N$ to $\{0,1\}^t$, and provided that the Bloom filter functions $H'$
behave like pseudo-random functions from $\{0,1\}^t \times \{1, \ldots, |H|\}$ to $\{1, \ldots, m\}$,
then the scheme is $\epsilon$-sound, with:



Propositions 1 and 2 are direct consequences of Lemma 2.

Remark 4 Proposition 2 assumes that the Bloom filter hash functions are
pseudo-random; this hypothesis is standard in Bloom filter analysis. It can be
achieved by using cryptographic hash functions with a random oracle-like
behaviour.

Proposition 3 (Sender Privacy) Assume that the underlying cryptosystem
is semantically secure and that the PIS function $\mathrm{update}_{BF}$ achieves User
Privacy, then the scheme ensures Sender Privacy.

Proof. If the scheme does not ensure Sender Privacy, that means that there
exists an attacker who can distinguish between the output of Send($x_0$, $pk$) and
Send($x_1$, $pk$), after the execution of Send($x_i$, $pk$), $i \in \{2, \ldots, \Omega\}$.

Note that the content of the Bloom filter buckets does not reveal information
that would make it possible to distinguish between $x_0$ and $x_1$. Indeed, the only
information $\mathcal{A}$ has from the filter structure is a set of Enc($\varphi(x_i)$) placed at
different indexes $h^c(x_i)$, $i = e, 2, \ldots, \Omega$. Thanks to the semantic security of Enc,
this does not make it possible to distinguish between $\varphi(x_0)$ and $\varphi(x_1)$.

This implies that, with inputs Enc($x_i$), $\mathrm{update}_{BF}(\mathrm{Enc}(\varphi(x_i)), h^c(x_i))$ (for $i \ge 2$),
the attacker can distinguish between Enc($x_0$), $\mathrm{update}_{BF}(\mathrm{Enc}(\varphi(x_0)), h^c(x_0))$
and Enc($x_1$), $\mathrm{update}_{BF}(\mathrm{Enc}(\varphi(x_1)), h^c(x_1))$.

As $\mathrm{update}_{BF}$ does not leak information on its inputs, that means that the
attacker can distinguish between Enc($x_0$) and Enc($x_1$) by choosing some other
inputs to Enc. That contradicts the semantic security assumption. □

Proposition 4 (Receiver Privacy) Assume that the PIR ensures User Pri- 
vacy, then the scheme ensures Receiver Privacy. 



Proof. This property follows directly from the PIR's User Privacy, as the only
information $\mathcal{S}$ gets from the execution of a Retrieve is a set of $\mathrm{Query}^{PIR}$
requests. □

These properties show that this protocol for Error-Tolerant Searchable
Encryption has the security properties that we looked for. LSH functions are used
in such a way that they do not degrade the security properties of the system.



6 Application to Identification with Encrypted 
Biometric Data 



6.1 Our Biometric Identification System 

We now apply our construction for error-tolerant searchable encryption to our 
biometric identification purpose. Thanks to the security properties of the above 
construction, this enables us to design a biometric identification system which 
achieves the security objectives stated in Section [^31 

While applying the primitives of error-tolerant searchable encryption, the
database $\mathcal{DB}$ takes the place of the server $\mathcal{S}$; the role of the Identity Provider
$\mathcal{IP}$ varies with the step involved. During the Enrolment step, $\mathcal{IP}$
behaves as $\mathcal{X}$, and as $\mathcal{Y}$ during the Identification step. In this step, $\mathcal{IP}$ is in
possession of the private key $sk$ used for the Retrieve query.

6.1.1 Enrolment 

• To enrol a user $U_i$, the sensor $\mathcal{SC}$ acquires a sample $b_i$ from his biometrics
and sends it to $\mathcal{IP}$,

• The Identity Provider $\mathcal{IP}$ then executes $\mathrm{Send}_{\mathcal{X},\mathcal{S}}(b_i, pk)$.

6.1.2 Identification 

• $\mathcal{SC}$ captures a fresh biometric template $b'$ from a user $U$ and sends it to
$\mathcal{IP}$,

• The Identity Provider $\mathcal{IP}$ then executes $\mathrm{Retrieve}_{\mathcal{Y},\mathcal{S}}(b', sk)$.

At the end of the identification, $\mathcal{IP}$ has the fresh biometric template $b'$ along
with the address of the candidate reference templates in $\mathcal{DB}$. To reduce the list
of identities, we can use a secure matching scheme [T^llll] to run a final secure 
comparison between b' and the candidates. 

6.2 Practical Considerations 

6.2.1 Choosing the LSH family: an Example 

Let us place ourselves in the practical setting of human identification through iris
recognition. A well-known method for doing so is to use Daugman's IrisCode [TB].
This extracts a 2048-bit vector, along with a "mask", that defines the relevant 
information in this vector. Iris recognition is then performed by computing a 
simple Hamming distance; vectors that are at a Hamming distance less than a 
given threshold are believed to come from the same individual, while vectors 
that come from different eyes will be at a significantly larger distance. 
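
The masked comparison described above can be sketched as follows. Codes and masks are represented as Python integers used as bit vectors, and the distance is normalized by the number of bits that both masks declare relevant, which is the usual way a fractional Hamming distance is computed on masked codes. The function names and the default threshold are illustrative assumptions, not values taken from this paper.

```python
def masked_hamming_distance(code_a, mask_a, code_b, mask_b):
    """Fraction of disagreeing bits among those valid in both masks."""
    valid = mask_a & mask_b
    n_valid = bin(valid).count("1")
    if n_valid == 0:
        raise ValueError("no bit is valid in both masks")
    disagreeing = (code_a ^ code_b) & valid
    return bin(disagreeing).count("1") / n_valid

def same_individual(code_a, mask_a, code_b, mask_b, threshold=0.32):
    """Decide a match; `threshold` is a hypothetical tuning parameter."""
    return masked_hamming_distance(code_a, mask_a, code_b, mask_b) < threshold
```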

There are several ways to design LSH functions adapted to this kind of
data. Random projections, such as those defined in [37], are a convenient way to
create LSH functions for binary vectors. However, for the sake of simplicity, we



propose to use the functions used in in which they are referred as 'beacon 
indexes'. These functions rely on the fact that not all IrisCode bits
have the same probability distribution.

In a few words, these functions first reorder the bits of the IrisCode by rows,
so that in each row, the bits that are the most likely to induce an error are
the least significant ones. The columns are then reordered to avoid correlations
between consecutive bits. The most significant bits of rows are then taken as 10-bit
hashes. The efficiency of this approach is demonstrated in [55], where the authors
apply these LSH functions to identify a person thanks to his IrisCode. They
work with the UAE database, which contains $N = 632500$ records; trivial
identification would then require about $N/2$ classical matching computations,
which is far too many for a large database. Instead, they apply $\mu = 128$ of
those hashes to the biometric data, and look for IrisCodes that get the same
LSH results for at least 3 functions. In doing this, they limit the number of
necessary matchings to 41 instead of $N$.
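
The candidate selection rule (keep the records that agree with the query on at least 3 hash functions) can be sketched as below. The hash values themselves are taken as given; computing actual beacon indexes from an IrisCode is outside the scope of this sketch. The names build_index and candidates, and the representation of each record as a list of hash values, are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_index(hashes_by_id):
    """Map each (function number, hash value) pair to the matching records.

    `hashes_by_id` is a dict {record_id: [hash_0, hash_1, ...]}.
    """
    index = defaultdict(set)
    for record_id, hashes in hashes_by_id.items():
        for i, value in enumerate(hashes):
            index[(i, value)].add(record_id)
    return index

def candidates(index, query_hashes, min_collisions=3):
    """Records that collide with the query on at least `min_collisions` functions."""
    counts = Counter()
    for i, value in enumerate(query_hashes):
        for record_id in index.get((i, value), ()):
            counts[record_id] += 1
    return {rid for rid, c in counts.items() if c >= min_collisions}
```

Only the records returned by candidates need to go through a full matching computation.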

Determining the LSH capacity of these hash functions is not easy
with real data; however, if we model $b$ and $b'$ as binary vectors such that
each bit of $b$ is flipped with a fixed probability (i.e. if $b'$ is obtained from
b through a binary symmetric channel), then the family induced is (r'i,r2, 1 — 
~ 2543)^°' ~ ^s^^^y^^^' "^^^^ estimation is conservative as IrisCodes are 
designed for biometric matching. 

Combining these functions with a Bloom filter with storage in the way
described in Sec. 4.3 makes it possible to obtain a secure identification scheme.

6.2.2 Overall complexity and efficiency 

We here evaluate the computational complexity of an identification request on
the client's side. We denote by $\kappa(op)$ the cost of operation $op$, and by $|S|$ the size
of the set $S$. Recalling Section 5.1, the overall cost of a request is:

$$\begin{aligned}
\kappa(\text{request}) &= |H^c| \left( \kappa(\text{hash}) + \kappa(\text{PIR}) + |T| \kappa(\text{Dec}) \right) + \kappa(\text{intersection}) \\
&\le |H^c| \left( \kappa(h^{BF}) + \kappa(h^{LSH}) + \kappa(\text{PIR}) + |T| \kappa(\text{Dec}) \right) + O(|T||H^c|)
\end{aligned}$$

We here used data structures in which the intersection of sets is linear in the set
length, hence the term $O(|T||H^c|)$; $|T|$ is the maximum size of a bucket of the
Bloom filter with storage.

To conclude this complexity estimation, let us recall that the cost of a hash
function is negligible compared with the cost of a decryption step. The PIR
query complexity at the sensor level depends on the scheme used (recall that
the PIR query is made only over the set of buckets and not over the whole
database); in the case of Lipmaa's PIR [38], this cost $\kappa(\text{PIR})$ is dominated by
the cost of a Damgård-Jurik encryption. The overall sensor complexity of an
identification request is $O(|H^c| (|T| \kappa(\text{Dec}) + \kappa(\text{PIR})))$.



7 Conclusion 



This paper details the first non-trivial construction for biometric identification 
over encrypted binary templates. This construction meets the privacy model 
one can expect from an identification scheme and the computation costs are 
sublinear in the size of the database. 

We studied identification schemes using binary data, together with the Hamming
distance. We plan to extend our scope to other metrics. A first lead to follow is
to use techniques from [37], which reduce the problem of ANN over Euclidean
spaces to ANN over a Hamming space.
A Achieving Symmetric Receiver Privacy



In this section, we introduce a new security concern, which we call Symmetric 
Receiver Privacy. 

A.1 Condition statement

This property aims at limiting the amount of information that $\mathcal{Y}$ gets through
the protocol. Indeed, while previous constructions of Searchable Encryption such
as [51 [23] seem to consider that the sender and the receiver are the same person,
thus owning the database in the same way, there are applications where the
receiver must not have the entire database at his disposal. If, for example,
different users $\mathcal{Y}_i$ have access to the application, we do not want user $\mathcal{Y}_i$ to
obtain information on another user $\mathcal{Y}_j$'s data.

For this purpose, we define a database simulator $\mathcal{S}_1$. $\mathcal{S}_1(x')$ is a simulator
which only knows the tags of the registered elements that are in $\Phi(x')$, while
the other elements are random. Here, $x'$ stands for the message to be retrieved.
On the other hand, $\mathcal{S}_0$ is the regular server, which genuinely runs the protocol.

Condition 8 (Symmetric Receiver Privacy) The scheme is said to respect
Symmetric Receiver Privacy if there exists a simulator $\mathcal{S}_1$ such that the
advantage of any malicious receiver is negligible in the $\mathrm{Exp}^{\text{Sym-Rec-Privacy}}$ experiment
described below. Here, $\mathcal{A}$ is the 'honest-but-curious' opponent taking the place
of $\mathcal{Y}$, and $\mathcal{C}$ the challenger at the server side.

$\mathrm{Exp}^{\text{Sym-Rec-Privacy}}$
The advantage of the adversary is $\left| \Pr[e' = e] - \frac{1}{2} \right|$.

This new condition does not fit into previous models for Searchable
Encryption, and is not satisfied by constructions such as [6l[23]. It is inspired by the
Data Privacy property of SPIR protocols, which states that it is not possible to
tell whether or not $\mathcal{S}$ possesses more data than the received messages. Indeed,
if the receiver is able to tell the difference between a server $\mathcal{S}_0$ that possesses
more data than what $\mathcal{Y}$ received, and a server $\mathcal{S}_1$ that just has in memory the
information that $\mathcal{Y}$ needs, then $\mathcal{Y}$ holds more information than he ought
to; that is why this indistinguishability game fits the informal description of
Symmetric Receiver Privacy.

Section A.3 gives a construction that also satisfies this security
condition.



A.2 Specific Tools 



ElGamal For this purpose, we specify a second cryptosystem $(\mathcal{S}etup, \mathcal{E}nc, \mathcal{D}ec)$
to be that of ElGamal: let $G$ be a cyclic group of order $q$, a large prime, with $g$ a
generator; let $f$ be another generator of $G$. $\mathcal{S}etup$ returns the key pair $(h = g^v, v)$
for $v$ a random integer. Encryption $\mathcal{E}nc$ takes a random value $r$, and computes
$\mathcal{E}nc(x) = (g^r, h^r x)$. The value $\mathcal{D}ec((y_1, y_2)) = y_2 / y_1^v$ can be computed thanks
to the secret key $v$. The homomorphic property is $\mathcal{D}ec(\mathcal{E}nc(x)\,\mathcal{E}nc(x')) = x x'$.
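
The multiplicative property of ElGamal can be checked on a toy group. The sketch below works in the subgroup of prime order 1019 of the integers modulo the safe prime 2039, with generator 4; these parameters are assumptions chosen only so that the example runs instantly and are far too small for any real use. The function names (setup, enc, dec, multiply) are illustrative.

```python
import secrets

P = 2039   # toy safe prime, P = 2 * Q + 1 (insecure size, for illustration)
Q = 1019   # prime order of the subgroup
G = 4      # generator of the subgroup of order Q

def setup():
    """Return the key pair (h, v) with h = G^v."""
    v = secrets.randbelow(Q - 1) + 1
    return pow(G, v, P), v

def enc(h, x):
    """Encrypt a group element x as (G^r, h^r * x)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (pow(h, r, P) * x) % P

def dec(v, ciphertext):
    """Recover x = y2 / y1^v, using Fermat's little theorem for the inverse."""
    y1, y2 = ciphertext
    return (y2 * pow(y1, P - 1 - v, P)) % P

def multiply(c1, c2):
    """Component-wise product: decrypts to the product of the plaintexts."""
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P
```

Raising both components of a ciphertext to a power t likewise yields an encryption of the plaintext raised to t, which is what the re-randomization used later in this appendix relies on.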

Secret splitting Let $s$ be a small secret; we wish to split $s$ into $n$
re-randomizable parts. There is a general technique for this, called Proactive
Secret Sharing [351[2H]j, but for the sake of clarity, we propose a simple technique.
We construct $n$ shares $A_1, \ldots, A_n$ such that $A_i = g^{r_i}$, where $r_i$ is a random
integer, for $i \in \{1, \ldots, n-1\}$, and $A_n = g^{s - \sum_{i=1}^{n-1} r_i}$, where $g$ is the generator
of a group of large prime order $q$. Recovering $s$ can be done by multiplying all
the $A_i$, and then proceeding to an exhaustive search to compute the discrete
logarithm of $g^s$ in basis $g$. Re-randomization of the parts $A_i$ can easily be done
by choosing a random integer $t$, and replacing each $A_i$ by $A_i^t$. The generator for
the discrete logarithm must then be replaced by $g^t$.
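
A minimal sketch of this splitting technique, on the same kind of toy group as one would use for a classroom ElGamal (subgroup of prime order 1019 modulo 2039, generator 4), is given below. The parameters and the function names (split, rerandomize, recover) are assumptions for illustration; the secret is assumed to be smaller than the group order, and t is assumed to be nonzero modulo that order.

```python
import secrets

P, Q, G = 2039, 1019, 4   # toy group: subgroup of prime order Q modulo P

def split(s, n):
    """Split s into n shares whose product is G^s."""
    exponents = [secrets.randbelow(Q) for _ in range(n - 1)]
    last = (s - sum(exponents)) % Q
    return [pow(G, e, P) for e in exponents + [last]]

def rerandomize(shares, t):
    """Raise every share to the power t; the new base is G^t."""
    return [pow(a, t, P) for a in shares], pow(G, t, P)

def recover(shares, base, max_secret):
    """Multiply the shares, then search exhaustively for the discrete log."""
    product = 1
    for a in shares:
        product = (product * a) % P
    acc = 1  # acc == base^s at each step of the loop
    for s in range(max_secret + 1):
        if acc == product:
            return s
        acc = (acc * base) % P
    raise ValueError("secret not found in the search range")
```

The exhaustive search is why the secret must be small: its cost is linear in the size of the range of possible secrets.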

A.3 Extending our Scheme

The scheme proposed in Sec. 5.1 does not achieve Symmetric Receiver Privacy.
For example, the user $\mathcal{Y}$ has access to all the $\varphi(x_i)$ such that there exist
$h^c_1, h^c_2 \in H^c$ with $h^c_1(x_i) = h^c_2(x')$. Without further caution, a malicious user could
get more information than he ought to. We here describe an example of a protocol
variant that leads to the desired properties.

We will apply secret splitting to the tags $\varphi(x)$ returned by Send. That
implies that we consider the range of $\varphi(x)$ to be relatively small, for example
32-bit integers. The primitives are adapted in this way:

• KeyGen($1^k$) is unchanged, but here both Setup and $\mathcal{S}etup$ are used to
generate $(pk, sk)$,

• $\mathrm{Send}_{\mathcal{X},\mathcal{S}}(x, pk)$ is slightly modified, namely:

1. Identifier establishment (unchanged) $\mathcal{S}$ attributes to $x$ a unique
identifier $\varphi(x)$, and sends it to $\mathcal{X}$.

2. Data storage (unchanged) $\mathcal{X}$ sends Enc($x$) to $\mathcal{S}$, who stores it in
a memory cell that depends on $\varphi(x)$.

3. Data indexing

- $\mathcal{X}$ splits the tag $\varphi(x)$ into $|H^c|$ shares $A_{x,1}, \ldots, A_{x,|H^c|}$ thanks
to the method described above, and picks a random integer $r_x$,

- $\mathcal{X}$ computes all $h^c_i(x)$, and executes the queries

$\mathrm{update}_{BF}((\mathcal{E}nc(f^{r_x}), \mathcal{E}nc(A_{x,i})), h^c_i(x))$

to send $(\mathcal{E}nc(f^{r_x}), \mathcal{E}nc(A_{x,i}))$ to be added to the filter's bucket
of index $h^c_i(x)$, where $h^c_i$ is the $i$-th function of $H^c$, for
$i \in \{1, \ldots, |H^c|\}$.

At the end of this update, the bucket $T_\alpha$ of the filter is filled with $\ell$ couples
$(\mathcal{E}nc(f^{r_{\alpha,j}}), \mathcal{E}nc(B_{\alpha,j}))$, $j \in \{1, \ldots, \ell\}$. $B_{\alpha,j}$ is a share of some tag, or a
random element of the group.

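• $\mathrm{Retrieve}_{\mathcal{Y},\mathcal{S}}(x', sk)$ is adapted accordingly:

1. (unchanged) $\mathcal{Y}$ computes each $\alpha_i = h^c_i(x')$ for $h^c_i \in H^c$, then
executes $\mathrm{Query}^{PIR}_{\mathcal{Y},\mathcal{S}}(\alpha_i)$,

2. - $\mathcal{S}$ first re-randomizes the content of each bucket of the Bloom
filter database by the same random value. The filter bucket
$T_{\alpha_i} = \{(\mathcal{E}nc(f^{r_{\alpha_i,j}}), \mathcal{E}nc(B_{\alpha_i,j})), j \in \{1, \ldots, \ell\}\}$ becomes

$T^{c_1,c_2}_{\alpha_i} = \{(\mathcal{E}nc(f^{r_{\alpha_i,j}})^{c_1}, \mathcal{E}nc(B_{\alpha_i,j})^{c_2}), j \in \{1, \ldots, \ell\}\}$

- $\mathcal{S}$ then answers to the PIR Query, and sends along $g^{c_2}$ to $\mathcal{Y}$,

3. $\mathcal{Y}$ decrypts the content of each bucket $T^{c_1,c_2}_{\alpha_i}$ to get a set of couples
$(f^{r_{\alpha_i,j} c_1}, B_{\alpha_i,j}^{c_2})$,

4. If the same element $f^{r_x c_1}$ is present in the intersection of all the
different sets $T^{c_1,c_2}_{\alpha_i}$, then $\mathcal{Y}$ possesses all shares of a tag $\varphi(x)$, and then
computes $\prod_i A_{x,i}^{c_2} = (g^{c_2})^{\varphi(x)}$,

5. $\mathcal{Y}$ finally runs a discrete logarithm of $(g^{c_2})^{\varphi(x)}$ in basis $g^{c_2}$, and adds
$\varphi(x)$ to the set of results.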

Note that this scheme can also be generalized for other Proactive Secret 
Sharing schemes. 

A.4 Security Properties

This new scheme is an extension of the previous one, and the same security
properties are achieved. Moreover, Condition 8 also holds. Indeed, the modification
to the Send procedure is not significant enough to alter the Sender Privacy
property: the only modification on $\mathcal{S}$'s side is the content of the $\mathrm{update}_{BF}$ procedure,
which does not leak. Moreover, the Receiver Privacy property is also preserved,
as communications from $\mathcal{Y}$ to $\mathcal{S}$ in Retrieve only involve a PIR query.

Proposition 5 (Symmetric Receiver Privacy) Assume that the PIR ensures Data
Privacy, i.e. it is a SPIR, and that $H$ is a $(\lambda_{min}, \lambda_{max}, \epsilon_1, \epsilon_2)$-LSH family with
a negligible $\epsilon_2$, then the scheme ensures Symmetric Receiver Privacy, under the
Decisional Diffie-Hellman hypothesis.



To demonstrate this proposition, let us begin with a preliminary Lemma. 



Lemma 3 Let $s_1, \ldots, s_t \in S$ be $t$ different secrets, with $|S|$ small. Let $A_{i,1}, \ldots, A_{i,n}$
be the $n$ parts of the secret $s_i$ split thanks to the method described in Sec. A.2. Let
$\pi_0 \subset \{A^c_{i,j}, i \in \{1, \ldots, t\}, j \in \{1, \ldots, n\}, c \in \{1, \ldots, q\}\}$ be a collection of $k$ such
parts, and $\pi_1 = \{g^{r_1}, \ldots, g^{r_k}\}$ a set of $k$ random elements of the cyclic group $G$.

Under the DDH assumption, if an adversary $\mathcal{A}$ can distinguish between $\pi_0$
and $\pi_1$, then there exist $c_0 \in \{1, \ldots, q\}$, $i \in \{1, \ldots, t\}$ such that
$\{A^{c_0}_{i,1}, \ldots, A^{c_0}_{i,n}\} \subset \pi_0$.

Sketch of proof 

Let $(g, g^a, g^b, g^c)$ be an instance of the DDH problem. An adversary can solve
this instance if he can tell, with non-negligible probability, whether $g^c = g^{ab}$ or
not.

We take $t = 1$, because all secrets are independent, and $n = 2$ (it is easy to
take $n > 2$ by multiplying the parts and returning to the case $n = 2$). Suppose
the lemma is false, that means that there exists a polynomial algorithm $\mathcal{A}$ that
takes as inputs couples $(g^{c_0}, g^{c_0 r})$ and $(g^{c_1}, g^{c_1 (s - r)})$, with $c_0 \ne c_1$, and that
returns the secret $s$ with non-negligible probability.

We then give as input to $\mathcal{A}$ the couples $(g, g^a)$ and $(g^b, g^{bs - c})$ for $s \in S$. If
$\mathcal{A}$ returns $s$, that means that $g^{bs - c} = g^{b(s - a)} = g^{bs} \cdot g^{-ab}$. We finally have an
advantage on the DDH problem; that proves the lemma. □

Proof of Proposition 5

We now build a simulator $\mathcal{S}_1$ for the server in order to prove the
proposition. Let $x'$ be the request and $\Phi(x') = \{\varphi(x_1), \ldots, \varphi(x_k)\}$ be the genuine
answer to Retrieve($x'$, $sk$). First, the simulator generates random elements
$\{z_1, \ldots, z_\Omega\} \subset \{1, \ldots, q\}$; he associates the first $k$ elements to the elements
of $\Phi(x')$. The simulator splits each of the $\varphi(x_j)$ into the $n = |H^c|$ parts
$A_{x_j,1}, \ldots, A_{x_j,n}$. Finally, he picks random integers $c_1, c_2$.

Since the PIR is symmetric, we can impose the response to each Query($\alpha_i$)
to be a set containing the $k$ elements that must be present in the intersection,
namely $(\mathcal{E}nc(f^{z_j})^{c_1}, \mathcal{E}nc(A_{x_j,i})^{c_2})$, and the remaining random values

$(\mathcal{E}nc(f^{z})^{c_1}, \mathcal{E}nc(g^{r}))$,

with $z$ a random element of $\{z_{k+1}, \ldots, z_\Omega\}$ and $r$ a random integer. We give
to the simulator enough memory to remember which $z$ was returned for which
$\alpha$, so that multiple queries to the same $\alpha$ are consistent. The simulator also
returns $g^{c_2}$.

Let $\mathcal{A}$ be a malicious receiver in the $\mathrm{Exp}^{\text{Sym-Rec-Privacy}}$ experiment. Following
Cond. 8, $\mathcal{A}$ makes $p$ Retrieve queries to $\mathcal{S}$; each of these requests leads to $|H^c|$
calls to Query. As the requests $x'_i$ are $\lambda_{max}$-separated, and as the hashes are
$(\lambda_{min}, \lambda_{max}, \epsilon_1, \epsilon_2)$-LSH with a negligible $\epsilon_2$, we can consider these Retrieve
queries to be independent.

Note that the first parts of the Bloom filters are always indistinguishable, as
they are generated in the same way. Therefore, if $\mathcal{A}$ distinguishes between $\mathcal{S}_0$
and $\mathcal{S}_1$, that means that he distinguished between a given $\pi_0$ and $\pi_1$, constructed
by taking the set of all answers to the Query requests he made. By application
of Lemma 3, we deduce the proposition.



